By the end of this tutorial, you will be able to:
LiDAR (Light Detection and Ranging) is an active remote sensing technology that measures distances by illuminating targets with laser pulses and analyzing the reflected light. Airborne Laser Scanning (ALS) systems mounted on aircraft or drones can collect millions of precise 3D points representing the Earth’s surface and features on it.
The lidR package is a comprehensive R toolkit for airborne LiDAR data manipulation and visualization. Created by Jean-Romain Roussel, it provides tools for:
Documentation: https://r-lidar.github.io/lidRbook/
You’ll work with Airborne Laser Scanning (ALS) data from Aarhus University campus in Denmark. The data is in LAZ format (compressed LAS), which is a standard format for storing LiDAR point clouds.
Data Source: Danish Agency for Data Supply and
Infrastructure
Coverage: University Park, Aarhus (1 km × 1 km
tile)
Free Download: https://dataforsyningen.dk/data/3931
Question 1: What is the difference between LAS and LAZ file formats?
- LAS is the raw, uncompressed LiDAR point cloud format. - LAZ is a lossless, compressed version of LAS - They contain exactly the same data, but LAZ is much smaller
The readLAS() function loads LiDAR data into R as a LAS
object.
## [ ] -2147483648% ETA: -2147483648s [ ] -2147483648% ETA: -2147483648s [ ] -2147483648% ETA: -2147483648s [ ] -2147483648% ETA: -2147483648s [ ] -2147483648% ETA: -2147483648s [ ] -2147483648% ETA: -2147483648s
LAS objects have two main components: - @header: Metadata about the point cloud (extent, CRS, point counts, etc.) - @data: The actual point data (X, Y, Z coordinates and attributes)
## File signature: LASF
## File source ID: 0
## Global encoding:
## - GPS Time Type: GPS Week Time
## - Synthetic Return Numbers: no
## - Well Know Text: CRS is GeoTIFF
## - Aggregate Model: false
## Project ID - GUID: 00000000-0000-0000-0000-000000000000
## Version: 1.4
## System identifier:
## Generating software:
## File creation d/y: 0/0
## header size: 375
## Offset to point data: 1005
## Num. var. length record: 1
## Point data format: 7
## Point data record length: 48
## Num. of point records: 1934412
## Num. of points by return: 1934412 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Scale factor X Y Z: 0.01 0.01 0.01
## Offset X Y Z: 0 0 0
## min X Y Z: 574448.3 6225346 -1.37
## max X Y Z: 574930.6 6225733 92.88
## Variable Length Records (VLR):
## Variable Length Record 1 of 1
## Description: by LAStools of rapidlasso GmbH
## Extra Bytes Description:
## Blue: additional attributes
## Green: additional attributes
## Red: additional attributes
## Extended Variable Length Records (EVLR): void
## X Y Z gpstime Intensity ReturnNumber
## <num> <num> <num> <num> <int> <int>
## 1: 574642.9 6225346 37.91 111077394 44 1
## 2: 574643.5 6225346 37.86 111077394 25 1
## 3: 574644.1 6225346 37.75 111077394 39 1
## 4: 574645.4 6225346 37.45 111077394 31 1
## 5: 574645.9 6225346 37.45 111077394 21 1
## ---
## 1934408: 574473.3 6225504 44.59 111078674 136 1
## 1934409: 574472.8 6225504 44.59 111078674 143 1
## 1934410: 574472.1 6225504 44.61 111078674 138 1
## 1934411: 574472.9 6225504 44.56 111078674 152 1
## 1934412: 574472.3 6225504 44.58 111078674 123 1
## NumberOfReturns ScanDirectionFlag EdgeOfFlightline Classification
## <int> <int> <int> <int>
## 1: 1 1 0 2
## 2: 1 1 0 2
## 3: 1 1 0 2
## 4: 1 1 0 2
## 5: 1 1 0 2
## ---
## 1934408: 1 1 0 2
## 1934409: 1 1 0 2
## 1934410: 1 1 1 2
## 1934411: 1 1 0 2
## 1934412: 1 1 1 2
## ScannerChannel Synthetic_flag Keypoint_flag Withheld_flag Overlap_flag
## <int> <lgcl> <lgcl> <lgcl> <lgcl>
## 1: 0 FALSE FALSE FALSE FALSE
## 2: 0 FALSE FALSE FALSE FALSE
## 3: 0 FALSE FALSE FALSE FALSE
## 4: 0 FALSE FALSE FALSE FALSE
## 5: 0 FALSE FALSE FALSE FALSE
## ---
## 1934408: 0 FALSE FALSE FALSE FALSE
## 1934409: 0 FALSE FALSE FALSE FALSE
## 1934410: 0 FALSE FALSE FALSE FALSE
## 1934411: 0 FALSE FALSE FALSE FALSE
## 1934412: 0 FALSE FALSE FALSE FALSE
## ScanAngle UserData PointSourceID R G B Blue Green Red
## <num> <int> <int> <int> <int> <int> <num> <num> <num>
## 1: 0 2 45063 38400 35584 34560 34560 35584 38400
## 2: 0 2 45063 24320 21504 20480 20480 21504 24320
## 3: 0 2 45063 17664 14848 13824 13824 14848 17664
## 4: 0 2 45063 44288 43264 40960 40960 43264 44288
## 5: 0 2 45063 45056 43776 42240 42240 43776 45056
## ---
## 1934408: 0 2 45064 15360 15872 13056 13056 15872 15360
## 1934409: 0 2 45064 15616 16128 13312 13312 16128 15616
## 1934410: 0 2 45064 17664 18176 15360 15360 18176 17664
## 1934411: 0 2 45064 16896 16128 14336 14336 16128 16896
## 1934412: 0 2 45064 16384 16384 13312 13312 16384 16384
## class : LAS (v1.4 format 7)
## memory : 236.1 Mb
## extent : 574448.3, 574930.6, 6225346, 6225733 (xmin, xmax, ymin, ymax)
## coord. ref. : NA
## area : 185488 units²
## points : 1.93 million points
## type : airborne
## density : 10.43 points/units²
## density : 10.43 pulses/units²
## File signature: LASF
## File source ID: 0
## Global encoding:
## - GPS Time Type: GPS Week Time
## - Synthetic Return Numbers: no
## - Well Know Text: CRS is GeoTIFF
## - Aggregate Model: false
## Project ID - GUID: 00000000-0000-0000-0000-000000000000
## Version: 1.4
## System identifier:
## Generating software:
## File creation d/y: 0/0
## header size: 375
## Offset to point data: 1005
## Num. var. length record: 1
## Point data format: 7
## Point data record length: 48
## Num. of point records: 1934412
## Num. of points by return: 1934412 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Scale factor X Y Z: 0.01 0.01 0.01
## Offset X Y Z: 0 0 0
## min X Y Z: 574448.3 6225346 -1.37
## max X Y Z: 574930.6 6225733 92.88
## Variable Length Records (VLR):
## Variable Length Record 1 of 1
## Description: by LAStools of rapidlasso GmbH
## Extra Bytes Description:
## Blue: additional attributes
## Green: additional attributes
## Red: additional attributes
## Extended Variable Length Records (EVLR): void
Question 2: What information is stored in the @header vs @data?
- The @header contains summary information about the point cloud, not the points themselves. Metadata like extent, CRS, scale/offset, point format, counts - The @data is a data.frame-like object where each row is one LiDAR point. Actual point-level attributes (X, Y, Z, intensity, classification)
## [1] 1934412
## [1] 1934412
## [1] 1934412
Question 3: How many points does this LiDAR dataset contain?
1934412
The Z coordinate represents elevation. Let’s explore its distribution.
# Basic histogram using base R
hist(Unipark$Z,
xlab = "Z",
main = "Distribution of Elevation Values",
col = "skyblue",
border = "white"
)# Create a data frame for ggplot
uni_df <- data.frame(
x = Unipark$X,
y = Unipark$Y,
z = Unipark$Z
)
# Advanced histogram using ggplot2
ggplot(data = uni_df, aes(x = z)) +
geom_histogram(fill = "steelblue",
color = "white",
bins = 50) +
labs(
title = "Distribution of Z Values",
x = "Elevation (m)",
y = "Count"
) +
theme_minimal()## [1] -1.37 92.88
Question 4: What is the range of Z values in the
dataset? (Check with range(Unipark$Z)) -1.37
92.88
Question 5: Can we directly use these Z values to determine vegetation height? Why or why not? no - the Z values alone tell us only how high each point is in the landscape, not the actual vegetation height. To get vegetation height, we need height above ground, not height above sea level
Note: The plot() function creates interactive 3D
visualizations of point clouds. However, this only works if you run R
and RStudio locally on your computer, since Posit Cloud does not have
the graphics engine to display these 3D plots. Alternatively, you can
plot 2D transects through the point cloud with ggplot2.
Tip: Use your mouse to rotate, zoom, and pan the 3D view!
Alternative plot of a point cloud transect with ggplot:
# Create a LiDAR transect
p1 <- c(574500, 6225500) # define starting point of transect
p2 <- c(574700, 6225500) # define ending point of transect
las_tr <- clip_transect(Unipark, p1, p2, width = 20, xz = TRUE)
# plot transect with ggplot with x on the x-axis, elevation on the y-axis and colored by elevation
ggplot(payload(las_tr), aes(X,Z, color = Z)) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()Many LiDAR datasets include RGB color information captured during flight.
Alternative plot of a point cloud transect with ggplot:
# Create a smaller LiDAR transect
p1 <- c(574500, 6225500)
p2 <- c(574700, 6225500)
las_tr2 <- clip_transect(Unipark, p1, p2, width = 5, xz = TRUE)
# plot transect with ggplot and RGB colors
color_plot <- rgb(las_tr2$R/65535, las_tr2$G/65535, las_tr2$B/65535) # generate RGB colors; divide by max 16-bit value to scale from 0-1
ggplot(payload(las_tr2), aes(X, Z, color = color_plot)) +
geom_point(size = 0.3) +
coord_equal() +
theme_minimal() +
theme(legend.position = "none") +
scale_color_manual(values = color_plot)LiDAR points are classified according to ASPRS (American Society for Photogrammetry & Remote Sensing) standards.
| Code | Class | Description |
|---|---|---|
| 1 | Unclassified | Never classified |
| 2 | Ground | Bare earth and terrain |
| 3 | Low Vegetation | < 0.5 m height |
| 4 | Medium Vegetation | 0.5 - 2 m height |
| 5 | High Vegetation | > 2 m height |
| 6 | Building | Structures |
| 7 | Low Point (Noise) | Outliers |
| 9 | Water | Water surfaces |
| 18 | High Noise | Aerial noise |
Reference: ASPRS LAS 1.4 Specification
# Colorize by classification
# plot(Unipark, # use for 3D plots on your local computer
# color = Classification,
# colorPalette = c("yellow", "lightgreen", "green",
# "darkgreen", "grey", "darkgrey",
# "blue", "lightblue", "white")
# ) Alternative plot of a point cloud transect with ggplot:
# plot transect with ggplot and Classification colors
class <- factor(las_tr2$Classification) # store classification as discrete factors so that discrete colors can be assigned
ggplot(payload(las_tr2), aes(X,Z, color = class) ) +
geom_point(size = 0.3) +
coord_equal() +
theme_minimal() +
scale_color_manual(values = c("grey", "yellow", "lightgreen", "green",
"darkgreen", "darkgrey","white","lightblue" ))Question 6: Which colors represent vegetation vs. buildings in the plot? green = vegetation yellow = ground grey = buildings
## [1] 2 5 4 3 6 7 1 19 20 18 9
##
## 1 2 3 4 5 6 7 9 18 19
## 30430 1037484 11481 27855 603276 209873 2224 5246 2022 4386
## 20
## 135
The filter_poi() function (Points Of Interest) filters
point clouds based on conditions.
# Filter ground points (Class 2)
ground_points <- filter_poi(Unipark, Classification == 2)
subset_i <- seq(0,length(ground_points$X),10) # use every 10th point
ggplot(payload(ground_points[subset_i]), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()# Filter low vegetation (Class 3)
low_veg <- filter_poi(Unipark, Classification == 3)
ggplot(payload(low_veg), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()# Filter medium vegetation (Class 4)
med_veg <- filter_poi(Unipark, Classification == 4)
ggplot(payload(med_veg), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()# Filter high vegetation (Class 5)
high_veg <- filter_poi(Unipark, Classification == 5)
subset_i <- seq(0,length(high_veg$X),5) # use every 10th point
ggplot(payload(high_veg[subset_i]), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()# Filter buildings (Class 6)
buildings <- filter_poi(Unipark, Classification == 6)
ggplot(payload(buildings), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()# Filter water (Class 9)
water <- filter_poi(Unipark, Classification == 9)
ggplot(payload(water), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()## [1] 1037484
Question 7: How many points are classified as
ground? (Use nrow(ground_points)) 1037484
Question 8: Which classification contains the most points? class 2, 1037484
Raw Z values represent elevation above sea level, not height above ground. To measure vegetation height, we need to normalize the point cloud by subtracting the ground elevation.
Vegetation Height = Z value - Ground Elevation
A DTM represents the bare earth surface. We’ll use a Triangulated Irregular Network (TIN) algorithm.
# Generate DTM using TIN algorithm
dtm <- grid_terrain(Unipark,
algorithm = tin(),
use_class = c(2L, 9L), # use ground (2) and water (9) points
res = 1) # 1-meter resolution
# Plot the DTM
plot(dtm,
main = "Digital Terrain Model (DTM)",
col = terrain.colors(50)
)Question 9: What does the DTM represent? the DTM represents the elevation of the actual ground surface, with vegetation and structures removed
The normalize_height() function subtracts ground
elevation from each point’s Z value.
# Normalize heights using the DTM
norm_height <- normalize_height(Unipark, dtm)
# Compare original vs. normalized heights
hist(Unipark$Z,
main = "Original Z Values",
xlab = "Elevation (m)",
col = "coral"
)hist(norm_height$Z,
main = "Normalized Heights",
xlab = "Height above ground (m)",
col = "forestgreen"
)## [1] -1.37 92.88
## [1] -41.41 48.59
Question 10: What is the range of normalized heights? How does this compare to the original Z values? -41.41 48.59 The original Z values are higher, because it includes both terrain elevation and object height. The normalized are smaller, because the terrain component has been removed. the high negative values must be misclassified or noisy points
Now let’s extract only vegetation points (classes 3, 4, 5) with meaningful heights.
# Filter vegetation: classes 3, 4, 5, and height > 0.5m
vegetation <- filter_poi(
norm_height,
Classification %in% c(3, 4, 5) & Z >= 0.5
)
# Visualize vegetation only
# Create a LiDAR transect
p1 <- c(574500, 6225500)
p2 <- c(574700, 6225500)
las_tr_veg <- clip_transect(vegetation, p1, p2, width = 20, xz = TRUE)
# plot transect with ggplot
ggplot(payload(las_tr_veg), aes(X,Z, color = Z)) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()## [1] 627587
Question 11: How many vegetation points remain after filtering? 627587
A CHM represents the height of the vegetation canopy above ground.
# Calculate CHM using 95th percentile of heights in each pixel
canopy_height <- grid_metrics(
vegetation,
func = quantile(Z, probs = 0.95), # 95th percentile
res = 1 # 1-meter resolution
)
# Plot the CHM
plot(canopy_height,
main = "Canopy Height Model (CHM)",
col = colorRampPalette(c("yellow", "green", "darkgreen"))(50)
)Question 12: Why use the 95th percentile instead of the maximum height? the 95th percentile is used instead of the maximum height because the maximum is very sensitive to noise, outliers, and stray LiDAR returns. If you take the maximum, a single noisy point can cause a pixel to incorrectly show a canopy height way too tall
Let’s identify areas with trees taller than 20 meters.
# Create a mask for trees > 20m
tall_tree_mask <- canopy_height
tall_tree_mask[tall_tree_mask < 20] <- NA
# Apply mask to CHM
tall_trees <- mask(canopy_height, tall_tree_mask) # use function from raster package
# Plot only tall trees
plot(tall_trees,
main = "Trees Taller than 20 meters",
col = colorRampPalette(c("orange", "red", "darkred"))(50)
)Question 13: Where are the tallest trees located in the park? (Describe geographically: north/south/east/west) mainly located in the north-central and eastern parts of the park
Vegetation structure can be characterized using various height percentiles and summary statistics.
# Calculate mean height in 1m grid cells
h_mean <- grid_metrics(
vegetation,
func = mean(Z),
res = 1
)
plot(h_mean,
main = "Mean Vegetation Height",
col = terrain.colors(50)
)Percentiles describe the distribution of heights in each grid cell.
# 25th percentile (1st quartile)
h_25p <- grid_metrics(
vegetation,
func = quantile(Z, probs = 0.25),
res = 1
)
plot(h_25p, main = "25th Percentile Height")# 50th percentile (median)
h_50p <- grid_metrics(
vegetation,
func = quantile(Z, probs = 0.5),
res = 1
)
plot(h_50p, main = "Median Height")# 75th percentile (3rd quartile)
h_75p <- grid_metrics(
vegetation,
func = quantile(Z, probs = 0.75),
res = 1
)
plot(h_75p, main = "75th Percentile Height")# 95th percentile (used for CHM)
h_95p <- grid_metrics(
vegetation,
func = quantile(Z, probs = 0.95),
res = 1
)
plot(h_95p, main = "95th Percentile Height")Question 14: What does the 25th percentile height tell us about vegetation structure? it represents the height below which 25% of all vegetation points in a grid cell occur
Question 15: Which percentile (25th, 50th, 75th, or 95th) best represents the dominant canopy height? The 95th percentile (H95) best represents the dominant canopy height, because it captures the upper canopy surface
The pulse penetration ratio measures how many laser pulses reached the ground, indicating canopy openness.
Formula: Pulse Penetration Ratio = (Ground Points) / (Total Points)
# Define function to calculate ratio
calc_ratio <- function(z_ground, z_total) {
ratio <- z_ground / z_total
return(ratio)
}
# Calculate at 1m resolution
pulse_pen_1m <- grid_metrics(
norm_height,
func = calc_ratio(
length(Z[Classification == 2]), # Ground points
length(Z) # Total points
),
res = 1
)
plot(pulse_pen_1m,
main = "Pulse Penetration Ratio (1m)",
col = colorRampPalette(c("darkgreen", "yellow"))(50)
)Question 16: What do high vs. low pulse penetration values indicate? High values => open and sparse vegetation; low values => dense and closed canopy
Count points in different height layers to understand vertical structure.
# 0-1 meter layer (understory)
layer_0_1m <- grid_metrics(
norm_height,
func = sum(Z >= 0 & Z <= 1),
res = 1
)
plot(layer_0_1m, main = "Point Density: 0-1m")# 1-5 meter layer (shrub layer)
layer_1_5m <- grid_metrics(
norm_height,
func = sum(Z >= 1 & Z <= 5),
res = 1
)
plot(layer_1_5m, main = "Point Density: 1-5m")# 5-10 meter layer (sub-canopy)
layer_5_10m <- grid_metrics(
norm_height,
func = sum(Z >= 5 & Z <= 10),
res = 1
)
plot(layer_5_10m, main = "Point Density: 5-10m")# 10-50 meter layer (canopy)
layer_10_50m <- grid_metrics(
norm_height,
func = sum(Z >= 10 & Z <= 50),
res = 1
)
plot(layer_10_50m, main = "Point Density: 10-50m")Question 17: Which height layer has the most points? What does this tell you about the vegetation structure? 0-1 meter layer (understory). lots of grass
Calculating metrics at different resolutions reveals patterns at different spatial scales.
# Pulse penetration at 1m resolution (already calculated)
plot(pulse_pen_1m, main = "Pulse Penetration: 1m Resolution")# Pulse penetration at 5m resolution
pulse_pen_5m <- grid_metrics(
norm_height,
func = calc_ratio(
length(Z[Classification == 2]),
length(Z)
),
res = 5 # 5-meter resolution
)
plot(pulse_pen_5m, main = "Pulse Penetration: 5m Resolution")# Pulse penetration at 10m resolution
pulse_pen_10m <- grid_metrics(
norm_height,
func = calc_ratio(
length(Z[Classification == 2]),
length(Z)
),
res = 10
)
plot(pulse_pen_10m, main = "Pulse Penetration: 10m Resolution")Question 18: How does the pattern change as you increase the resolution from 1m to 10m? 1m resolution = more details 10m resolution = coarser, landscape-scale patterns
Question 19: Which resolution is most appropriate for: - Detailed tree-level analysis? 1m - Landscape-level patterns? 10m
You can calculate multiple metrics simultaneously using custom functions.
# Define a function to calculate multiple metrics
my_metrics <- function(z) {
list(
mean_height = mean(z),
max_height = max(z),
sd_height = sd(z),
cv_height = sd(z) / mean(z) * 100, # Coefficient of variation
n_points = length(z)
)
}
# Calculate all metrics at once
all_metrics <- grid_metrics(
vegetation,
func = ~my_metrics(Z),
res = 5
)
# Plot each metric
plot(all_metrics, "mean_height", main = "Mean Height (5m)")